Abstract: This paper describes methods used to minimize the size of data files. Indicators
are defined for measuring the size of files and databases. The storage optimization process
consists of selecting, from a multitude of data storage models, the one that satisfies the
objective of the proposed problem: the maximization or minimization of an optimum criterion
that is mapped onto the amount of disk memory used. The paper describes different solutions
implemented to minimize the input/output file size of a software application that manages
educational system data.

Key words: file, database, optimization, data, educational system. 

1. File and database dimension 

Consider a finite set of N elements, $El_1, El_2, ..., El_N$. An optimum criterion is
defined, and the element $El_k$ that maximizes or minimizes the function associated with
this criterion is sought. This defines the optimization problem in the informatics field;
the framework applies to any software quality characteristic that is included in the
optimization process.

Consider a collectivity C composed of the elements $c_1, c_2, ..., c_N$, where N is the
total number of elements. Each element $c_i$ is described using M characteristics,
$A_1, A_2, ..., A_M$.

For each characteristic $A_j$, values or attributes are used to describe the measured
levels of the $c_i$ elements. These values or attributes are represented as arrays of
characters, or strings. As a result, the string $s_{ij}$ describes the level of the
characteristic $A_j$ for the element $c_i$.

The string $s_{ij}$ is characterized by its length $L_{ij}$, expressed as a number of
symbols.

Storing data in files or conventional databases presupposes homogeneous data structures
for all the articles of the collectivity. To characterize the required memory space, a
series of indicators is defined; they measure this dimension and provide a quantitative
approach to the problem.

To determine the memory space reserved by a software application for its data, the
following steps are carried out:






Education in the Romanian Information Society 
in the Period Leading Up to EU Integration 


JOURNAL OF APPLIED QUANTITATIVE METHODS


- the descriptions of the elements of the collectivity C are recorded in a table with N
rows and M columns;

- for each characteristic $A_j$, the element that has the maximum length is selected:

$$L_{max}^{j} = \max_{1 \le i \le N} L_{ij}$$

- the structure used to describe the elements is constructed; its form is

    struct struc
    {
        type comp1;
        type comp2;
        ...
        type compj;
        ...
        type compM;
    };

and the indicator $LG(type\ comp_j) = L_{max}^{j}$ is defined;

- the database with fixed-length articles, BDF, is obtained.

In the database, an article is recorded for each element of the collectivity C, with the
dimension

$$L_{art} = \sum_{j=1}^{M} L_{max}^{j}$$

For the students collectivity STUD, described in table 1, the field lengths and their
maximum values are measured, and the article size is determined from them.


Table 1. Description of the students collectivity

No. | Name            | First name  | Height | Gender    | City         | Age   | Date of Birth | School
----|-----------------|-------------|--------|-----------|--------------|-------|---------------|-------
1   | Anghelache(10)  | Ion(3)      | 132(3) | Male(4)   | Bucharest(9) | 12(2) | 24/11/93(8)   | 173(3)
2   | Bujor(5)        | Elena(5)    | 126(3) | Female(6) | Iasi(4)      | 12(2) | 12/07/93(8)   | 10(2)
3   | Biteanu(7)      | Cristian(8) | 125(3) | Male(4)   | Ploiesti(8)  | 10(2) | 14/04/95(8)   | 154(3)
4   | Cretu(5)        | Ion(3)      | 132(3) | Male(4)   | Bucharest(9) | 12(2) | 06/05/93(8)   | 3(1)
5   | Cretu(5)        | Roxana(6)   | 137(3) | Female(6) | Bucharest(9) | 14(2) | 27/05/91(8)   | 189(3)
6   | Danciulescu(11) | Mihai(5)    | 137(3) | Male(4)   | Ploiesti(8)  | 14(2) | 16/07/91(8)   | 56(2)
7   | Danciulescu(11) | Ion(3)      | 135(3) | Male(4)   | Bucharest(9) | 14(2) | 19/07/91(8)   | 133(3)
8   | Ene(3)          | Catalin(7)  | 126(3) | Male(4)   | Ploiesti(8)  | 10(2) | 05/03/95(8)   | 43(2)
9   | Ionescu(7)      | Irina(5)    | 131(3) | Female(6) | Bucharest(9) | 11(2) | 22/06/94(8)   | 17(2)
10  | Ionescu(7)      | Catalin(7)  | 128(3) | Male(4)   | Iasi(4)      | 12(2) | 11/02/93(8)   | 23(2)
SUM | 71              | 52          | 30     | 46        | 69           | 20    | 80            | 23

TOTAL = 391


The number in parentheses after each value is its length in characters.

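Table 2. Field lengths for the students collectivity

    | Name | First name | Height | Gender | City | Age | Date of Birth | School
----|------|------------|--------|--------|------|-----|---------------|-------
max | 11   | 8          | 3      | 6      | 9    | 2   | 8             | 3

$L_{art} = 50$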
The database length, $L_{BD}(STUD)$, is determined with the relation
$L_{BD} = N \cdot L_{art}$. For the students database the size is
$L_{BD}(STUD) = 10 \cdot 50 = 500$ bytes.

Some of the article fields hold values that are shorter than the defined maximum length.
As shown next, this increases the database size and reveals a data storage solution that
is inefficient from the memory space viewpoint.
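The computation of the fixed-length article and database sizes can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the constant `kStudMaxLen` are hypothetical, and the maximum field lengths are the ones listed in table 2.

```cpp
#include <cassert>
#include <cstddef>

// Article length of a fixed-length record: the sum of the maximum field lengths.
std::size_t article_length(const std::size_t *max_field_len, std::size_t field_count)
{
    assert(max_field_len != nullptr);
    std::size_t total = 0;
    for (std::size_t j = 0; j < field_count; ++j)
        total += max_field_len[j];
    return total;
}

// Size of a fixed-length database: N articles of identical length.
std::size_t fixed_database_size(std::size_t article_count, std::size_t article_len)
{
    return article_count * article_len;
}

// Maximum field lengths from table 2, in the order Name ... School
// (hypothetical constant, introduced only for this illustration).
static const std::size_t kStudMaxLen[8] = {11, 8, 3, 6, 9, 2, 8, 3};
```

With these values, `article_length(kStudMaxLen, 8)` gives the article length of 50 bytes and `fixed_database_size(10, 50)` the 500 bytes of the STUD database.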


The memory use, or efficiency, degree is determined by the relation:

$$G_u = \frac{\sum_{i=1}^{N} \sum_{j=1}^{M} L_{ij}}{L_{BD}}$$

The memory non-use, or inefficiency, degree is determined by the relation:

$$G_{nu} = 1 - G_u = \frac{L_{BD} - \sum_{i=1}^{N} \sum_{j=1}^{M} L_{ij}}{L_{BD}}$$

For the considered example, the STUD database, the indicator values are
$G_u = 391/500 = 0.78$ and $G_{nu} = 0.22$. These values are used as a comparative base to
evaluate the efficiency of the proposed solutions from the viewpoint of reserved memory
space.


2. Optimization of used memory size

The first variant uses a separator character, a marker that indicates the end of an array
of characters, such as "\0" in C/C++ and other programming languages. This symbol is
denoted by $\alpha$. The string $s_{ij}$ to which this end marker is appended becomes
$s'_{ij}$:

$$s_{ij} \,\|\, \alpha = s'_{ij}; \qquad lg(s'_{ij}) = lg(s_{ij}) + 1$$

The strings $s'_{ij}$ that describe the collectivity elements $c_i$, with i = 1..N, are
concatenated. The length indicator that measures the dimension of a database element is
given by the relation

$$L'_{art_i} = \sum_{j=1}^{M} lg(s'_{ij})$$

In the case of the students STUD database, applying this solution leads to the data form:

Anghelache(10)#Ion(3)#132(3)#Male(4)#Bucharest(9)#12(2)#24/11/93(8)#173(3)#Bujor(5)
#Elena(5)#126(3)#Female(6)#Iasi(4)#12(2)#12/07/93(8)#10(2)...

For this data storage variant, the database dimension is given by the total number of
article characters, to which is added the number of bytes reserved for the string end
markers. The length of the first article of table 1 is $L'_{art_1} = 42 + 7 = 49$, where 42
represents the number of characters contained in the article.

The memory use efficiency degree of this database format is:

$$G_u = \frac{\sum_{i=1}^{N} \sum_{j=1}^{M} L_{ij}}{L_{BD'}}$$

The values of the previously defined indicators are $L_{BD'} = 391 + 70 = 461$ bytes,
$G_{nu} = 70/461 = 0.15$ and $G_u = 0.85$.
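The indicators used so far can be expressed in a few lines of code. The sketch below is illustrative only; the function names are hypothetical, and the count of seven markers per article follows the figures used in the text (42 + 7 for the first article, 70 for the ten articles).

```cpp
#include <cassert>
#include <cstddef>

// Memory use (efficiency) degree: useful data bytes over total stored bytes.
double use_degree(std::size_t data_bytes, std::size_t database_bytes)
{
    assert(database_bytes > 0 && data_bytes <= database_bytes);
    return static_cast<double>(data_bytes) / static_cast<double>(database_bytes);
}

// Memory non-use (inefficiency) degree: the complement of the use degree.
double non_use_degree(std::size_t data_bytes, std::size_t database_bytes)
{
    return 1.0 - use_degree(data_bytes, database_bytes);
}

// Size of the separator-based database: data bytes plus one byte per marker.
std::size_t separator_database_size(std::size_t data_bytes,
                                    std::size_t article_count,
                                    std::size_t markers_per_article)
{
    return data_bytes + article_count * markers_per_article;
}
```

For the STUD example, `use_degree(391, 500)` is about 0.78 for the fixed-length database, while `separator_database_size(391, 10, 7)` gives 461 bytes and `non_use_degree(391, 461)` about 0.15 for the separator variant.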

To optimize means to find a way of constructing a database that is smaller than other
databases of the same collectivity that are built with a different data storage technique.

For this solution, the particular situations that lead to worse results must be taken into
account. These cases are characterized by a data set in which every field value has the
maximum length. If $lg(s_{ij}) = L_{max}^{j}$ for i = 1, 2, ..., N and j = 1, 2, ..., M, it
results that $lg(s'_{ij}) = L_{max}^{j} + 1$ and the database BD' has a dimension equal to
$L_{BD'} = L_{BD} + M \cdot N$.

For a database with ten articles that have eight fields and $L_{art} = 50$, applying this
solution generates a database of $L_{BD'} = 500 + 70 = 570$ bytes. The overuse degree
$G_o$ is given by the relation $G_o = L_{BD'} / L_{BD} - 1$. For the analyzed situation,
the indicator value is $G_o = 570/500 - 1 = 0.14$. Based on this result, it is concluded
that, in this particular case, the storage variant generates a database whose memory size
is 14% larger.
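The worst-case check described above can be sketched as a pair of small functions. The names are hypothetical, and the rule in `separators_pay_off` is only a restatement of the comparison made in the text, not a routine from the application.

```cpp
#include <cassert>
#include <cstddef>

// Overuse degree: relative growth of the marker-based database over the
// fixed-length one (0.0 means no growth).
double overuse_degree(std::size_t marked_bytes, std::size_t fixed_bytes)
{
    assert(fixed_bytes > 0);
    return static_cast<double>(marked_bytes) / static_cast<double>(fixed_bytes) - 1.0;
}

// True when storing the values with end markers needs fewer bytes than the
// fixed-length layout.
bool separators_pay_off(std::size_t data_bytes, std::size_t marker_bytes,
                        std::size_t fixed_bytes)
{
    return data_bytes + marker_bytes < fixed_bytes;
}
```

With the figures of the example, `overuse_degree(570, 500)` is approximately 0.14, and `separators_pay_off(391, 70, 500)` holds for the table 1 data but not for a data set in which all 500 bytes are occupied.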

The second variant uses data conversions and compressions that reduce the database length.
Numerical values represented in the database as character arrays are converted to a binary
integer or floating-point format. For example, the values that describe the student height
need a single byte when they are saved in numerical format as unsigned integers.

For the table 1 data, the internal binary format associated with the field values is
determined from the maximum value of each variable and from the fundamental data types
defined by the programming language used to develop the software application. Choosing
C/C++ as the programming language, the numerical fields of the stud structure require the
memory space described in table 3.


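Table 3. Memory space reserved by the numerical fields of the article

Field         | Dimension | C/C++ data type used
--------------|-----------|-------------------------------------------------------
Height        | 1 byte    | one-byte unsigned integer (unsigned char)
Age           | 1 byte    | one-byte unsigned integer (unsigned char)
Date of Birth | 3 bytes   | structure of 3 one-byte unsigned integers
School        | 1 byte    | one-byte unsigned integer (unsigned char)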
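The conversion from character arrays to the binary formats of table 3 might look like the sketch below. The type `BirthDate` and the functions `to_byte` and `to_birth_date` are hypothetical names introduced for illustration; the paper does not list its conversion code, and the range checks are an added assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Date of birth packed into three one-byte fields (day, month, two-digit year).
struct BirthDate {
    std::uint8_t day;
    std::uint8_t month;
    std::uint8_t year;
};

// Convert a decimal string such as "132" to a one-byte unsigned value.
// Returns false when the text is not a number in the range 0..255.
bool to_byte(const char *text, std::uint8_t *out)
{
    assert(text != nullptr && out != nullptr);
    char *end = nullptr;
    unsigned long value = std::strtoul(text, &end, 10);
    if (end == text || *end != '\0' || value > 255UL)
        return false;
    *out = static_cast<std::uint8_t>(value);
    return true;
}

// Parse a "dd/mm/yy" string into the packed three-byte representation.
bool to_birth_date(const char *text, BirthDate *out)
{
    assert(text != nullptr && out != nullptr);
    unsigned day = 0, month = 0, year = 0;
    if (std::sscanf(text, "%2u/%2u/%2u", &day, &month, &year) != 3)
        return false;
    if (day < 1 || day > 31 || month < 1 || month > 12)
        return false;
    out->day = static_cast<std::uint8_t>(day);
    out->month = static_cast<std::uint8_t>(month);
    out->year = static_cast<std::uint8_t>(year);
    return true;
}
```

A height stored as the three characters "132" then occupies one byte, and a date such as "24/11/93" shrinks from eight characters to three bytes, which is the saving table 3 describes.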


Storing numerical data in binary format reduces the memory size. Based on that, the
resulting article contains:

- fields with an end marker: field1, field2, field4 and field5;

- fields with a standard length imposed by the conversion: field3, field6, field7 and
field8.

The length of the compressed database BD'' is

$$L_{BD''} = \sum_{i=1}^{N} L_{art_i} + k \cdot N \cdot M + N$$

where $L_{art_i}$ is the number of data bytes of article i, k represents the number of
fields that have an end separator, and M represents here the size of the marker, in this
case equal to one. For the other fields, constant lengths are obtained through
compression/conversion. A marker is also used to indicate the end of an article. For the
table 1 example, the first article is saved in the form

Anghelache(10)#Ion(3)#Male(4)#Bucharest(9)#132(1)12(1)24/11/93(3)173(1)#

and its dimension is $L''_{art_1} = 32 + \#(5) = 37$ bytes.

In the end, a total length of 298 bytes is obtained for all 10 records, and the database
dimension is $L_{BD''} = 298 + 4 \cdot 10 + 10 = 348$ bytes, because k = 4 fields have
string markers.

For this data storage variant, the degree of space use efficiency has the value
$G_u = 298/348 = 0.85$.

The third variant improves the solutions proposed in the previous variants by defining a
method that does not use end markers. The working context and the implementation of the
solution impose a series of restrictive conditions that form the basis of the data model
used.

Consider the structure art that combines into a single article all the data needed to
process the entity. Its format is:

    art { tip_1 camp_1; tip_2 camp_2; ...; tip_s camp_s; }

For this approach, the size of a database that contains nart articles of this type is
determined by the indicator

$$L_{BD} = \sum_{i=1}^{nart} \sum_{j=1}^{s} S_{ij}$$

in which $S_{ij}$ represents the length of field j of article i.

In order to store the data and minimize the reserved memory space, the fields are arranged
so that two adjacent fields $camp_i$ and $camp_{i+1}$ do not have the same type,
$tip_i \ne tip_{i+1}$ for i = 1, ..., s-1. This arrangement allows the elimination of the
field end markers, because the transition from one data type to another is signalled by
the different internal format.

Consider the software application that manages the database described in table 1. The
difference between the recorded data types allows the use of the current data storage
variant, obtaining the article:

    stud { Name; Height; First_name; Age; Gender; Date_of_Birth; City; School; }

Implementing this method, the first article of the database has its dimension reduced to
$L_{art_1} = 10 + 1 + 3 + 1 + 4 + 3 + 9 + 1 = 32$ bytes.

The field dimensions are not modified with respect to the previous solution, so a total
length of 298 bytes is obtained for all ten articles. The memory space reserved for the
entire database is $L_{BD} = 298 + 10 = 308$ bytes. The reduced size is the result of using
only the article end markers.

For this data storage version, the indicator used to measure the efficiency of space
utilization has the value $G_u = 0.96$, most of the bytes representing data used in the
processing activity.
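The sizes obtained for the second and third variants can be reproduced with the following sketch. The function names are hypothetical, and the code assumes that the article end marker has the same size as the field end marker (one byte in the example).

```cpp
#include <cassert>
#include <cstddef>

// Second variant: string fields keep an end marker, numeric fields are stored
// in binary form, and every article is closed by one marker.
std::size_t converted_database_size(std::size_t data_bytes,
                                    std::size_t article_count,
                                    std::size_t marked_fields,
                                    std::size_t marker_size)
{
    assert(marker_size > 0);
    return data_bytes
         + marked_fields * article_count * marker_size  // field end markers
         + article_count * marker_size;                 // article end markers
}

// Third variant: field types alternate, so only the article end marker remains.
std::size_t alternating_database_size(std::size_t data_bytes,
                                      std::size_t article_count,
                                      std::size_t marker_size)
{
    return converted_database_size(data_bytes, article_count, 0, marker_size);
}
```

For the 298 data bytes of the ten STUD articles, `converted_database_size(298, 10, 4, 1)` gives 348 bytes and `alternating_database_size(298, 10, 1)` gives 308 bytes.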

In the fourth variant, a vocabulary $V_j$ is considered that contains the set of distinct
values of the characteristic $A_j$ over the elements of the collectivity C.

The set $V_j$ is described by its elements, $V_j = \{v_{j1}, v_{j2}, ..., v_{jn_j}\}$,
where $v_{j1}, v_{j2}, ...$ are words of the vocabulary $V_j$ and $lg(v_{jl})$ is the
length of the word $v_{jl}$.

Any array of characters $s_{ij}$ that represents the value of field j of article i exists
in the collectivity vocabulary $V_j$.

This solution assumes a large amount of data and a vocabulary with a limited number of
words. The more often the values repeat, the more efficient the method is.

Each vocabulary word occupies a fixed position. The new form of the article contains the
position of the value in the vocabulary, replacing the character array by a number.

The steps required for a proper application of the method are:

- the vocabularies $V_1, V_2, ..., V_M$ are defined for all the M characteristics used to
describe the collectivity elements;

- the vocabularies are stored in a separate database, BDV, that has the length

$$Lg(BDV) = lg(V_1) + lg(V_2) + ... + lg(V_M) = \sum_{k=1}^{M} lg(V_k)$$

- the collectivity database, BDC, is built using the positions of the values in the
vocabulary:

$$Lg(BDC) = \sum_{i=1}^{N} \sum_{j=1}^{M} lg(Poz_{ij})$$

where $Poz_{ij}$ is the field that holds the vocabulary position of the value for the
element $c_i$ and the vocabulary $V_j$.

If all the positions are represented by a field of length $L_{poz}$, the collectivity
database length is

$$Lg(BDC) = M \cdot N \cdot L_{poz}$$

For the table 1 example, the common vocabulary is defined as

W = { Anghelache(1), Bujor(2), Biteanu(3), Cretu(4), Danciulescu(5), Ene(6), Ionescu(7),
Ion(8), Elena(9), Cristian(10), Roxana(11), Mihai(12), Catalin(13), Irina(14), Male(15),
Female(16), Bucharest(17), Iasi(18), Ploiesti(19) }

The number in parentheses is the position of the value in the vocabulary W. If the maximum
length $L_{max} = 11$ is reserved for every value of W, then Lg(BDV) = 19 * 11 = 209 bytes.

The positions require one byte, $L_{poz} = 1$, so the size of a database article is given
by the relation

$$L_{art_i} = \sum_{l=1}^{nc} lg(s'_{il}) + k \cdot L_{poz}$$

where:

nc - the number of article fields that keep their initial format; if these fields have
variable length, an end marker is used to separate them;

$s'_{il}$ - the string value with the end marker;

k - the number of fields that are replaced by their position in the vocabulary;

$L_{poz}$ - the length of the position field.

The size of the compressed database is determined by the indicator

$$L(BDC) = \sum_{i=1}^{N} L_{art_i} + L(BDV)$$

For the considered example, the result is:

$$L(BDC) = (6 + 4 \cdot 1)_1 + (6 + 4 \cdot 1)_2 + ... + (6 + 4 \cdot 1)_{10} + 209 = 100 + 209 = 309 \text{ bytes}$$

This solution is further improved by minimizing the vocabulary dimension, because its
efficiency depends directly on the maximum size of the vocabulary values and on their
average size. Because the lengths of the elements vary, a fixed-size structure wastes
memory space. Using end markers for the elements reduces the reserved space.

With the '#' marker, the vocabulary has a size of L(BDV) = 118 + 19 * 1 = 137 bytes. In
this case the database length becomes L(BDC) = 100 + 137 = 237 bytes.
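The two vocabulary sizes discussed above can be checked with a short sketch. The array `kVocabulary` reproduces the words of W in the order given in the text; the function names are hypothetical, and positions are 0-based here, whereas the text numbers them from 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Common vocabulary W of the table 1 example, in the order given in the text.
static const char *const kVocabulary[] = {
    "Anghelache", "Bujor", "Biteanu", "Cretu", "Danciulescu", "Ene", "Ionescu",
    "Ion", "Elena", "Cristian", "Roxana", "Mihai", "Catalin", "Irina",
    "Male", "Female", "Bucharest", "Iasi", "Ploiesti"
};
static const std::size_t kWordCount = sizeof(kVocabulary) / sizeof(kVocabulary[0]);

// Vocabulary size when every word is stored in a slot of the maximum word length.
std::size_t fixed_vocabulary_size(const char *const *words, std::size_t count)
{
    std::size_t longest = 0;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t len = std::strlen(words[i]);
        if (len > longest)
            longest = len;
    }
    return count * longest;
}

// Vocabulary size when every word is followed by a one-byte end marker.
std::size_t marked_vocabulary_size(const char *const *words, std::size_t count)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += std::strlen(words[i]) + 1;
    return total;
}

// Position of a value in the vocabulary (0-based), or -1 when it is absent.
int vocabulary_position(const char *const *words, std::size_t count, const char *value)
{
    assert(value != nullptr);
    for (std::size_t i = 0; i < count; ++i)
        if (std::strcmp(words[i], value) == 0)
            return static_cast<int>(i);
    return -1;
}
```

For the 19 words of W, `fixed_vocabulary_size` returns 209 and `marked_vocabulary_size` returns 137, the two values of L(BDV) used in the text.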

3. Selecting optimization method 

Consider the optimization methods $M_1, M_2, ..., M_r$, each associated with a module of a
software application intended to optimize educational data storage.

A file F represents the input data of the considered application.

The result of the data processing activity consists of the files $E_1, E_2, ..., E_r$,
created by the corresponding optimization modules. The relation between modules and
methods is one to one. The indicators $LG(E_1), LG(E_2), ..., LG(E_r)$ are determined.
Optimizing the storage files in an automated manner is equivalent to implementing in the
software application a module that selects
$LG_{min} = \min\{LG(E_1), LG(E_2), ..., LG(E_r)\} = LG(E_k)$. It follows that the storage
method $M_k$ is the most efficient one, and it is the method that will be implemented in
the final version of the product.
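The selection step amounts to finding the index of the smallest output file. A minimal sketch, with the hypothetical name `select_smallest` and the assumption that the file sizes have already been measured, is:

```cpp
#include <cassert>
#include <cstddef>

// Return the index k of the smallest output file size LG(E_k).
// Ties are resolved in favour of the first (lowest-index) method.
std::size_t select_smallest(const std::size_t *sizes, std::size_t count)
{
    assert(sizes != nullptr && count > 0);
    std::size_t best = 0;
    for (std::size_t i = 1; i < count; ++i)
        if (sizes[i] < sizes[best])
            best = i;
    return best;
}
```

Applied to the five database sizes reported in table 5 (520, 504, 448, 418 and 423 bytes), the function returns index 3, the 418-byte file.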

The software application is developed in the C/C++ programming language and implements the
storage techniques described previously.

The data structure needed to store the data of the high school students database is
defined. The example described in table 4 is considered.


Table 4. Students database

No. | Name         | First name | PNC  | Height | Weight | School | City
----|--------------|------------|------|--------|--------|--------|-----------
1   | Alexandrescu | Ionela     | 2... | 145    | 47     | 175    | Bucharest
2   | Bratescu     | Catalin    | 1... | 139    | 50     | 175    | Buftea
3   | Constantin   | Adrian     | 1... | 145    | 50     | 160    | Mihailesti
4   | Constantin   | Mihai      | 1... | 135    | 47     | 163    | Bucharest
5   | Gheorghe     | Florin     | 1... | 137    | 49     | 179    | Bucharest
6   | Ionescu      | Gabriela   | 2... | 139    | 44     | 3      | Bucharest
7   | Ionescu      | Adrian     | 1... | 132    | 50     | 175    | Bucharest
8   | Popescu      | Adrian     | 1... | 135    | 48     | 173    | Otopeni
9   | Popescu      | Alina      | 2... | 139    | 41     | 160    | Bucharest
10  | Zamfir       | Ion        | 1... | 135    | 50     | 3      | Buftea
The methods used to store the table 4 data are:

- a solution that is widely used in real applications and has a low level of complexity is
the definition of a data structure that is associated with each of the database articles;
the file saving operation is made without auxiliary data processing; the data structure
used to store the students data is

    struct stud
    {
        char nume[13];                /* name */
        char prenume[9];              /* first name */
        char cnp[14];                 /* personal numeric code (PNC) */
        unsigned short int inaltime;  /* height */
        unsigned char greutate;       /* weight */
        unsigned short int scoala;    /* school */
        char localitate[11];          /* city */
    };

the dimension of the stud article is 52 bytes; the dimension of the database obtained by
saving the articles in the output file in this normal form is LG(BDF) = 520 bytes; the
code sequence that writes the data in the file is:

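    #include <stdio.h>

    /* Writes the dim articles of listaStud in their in-memory layout. */
    void salvareDate(FILE *pfisier, stud *listaStud, int dim)
    {
        if (pfisier) {
            for (int i = 0; i < dim; i++) {
                fwrite(&listaStud[i], sizeof(stud), 1, pfisier);
            }
        }
    }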

- the data are written in the file using the delimiter marker '#' to separate the article
fields; this solution corresponds to the first variant of the storage methods; the
numerical values are converted into char arrays before being written into the file; the
$BD_{separator}$ database is obtained, with the dimension $LG(BD_{separator}) = 504$
bytes; the internal routine used to save the data in this format is

    #include <string.h>

    void transformare1_OUT(stud *listaStud, int dim)
    {
        FILE *pfisOUT = fopen("DateTEST.txt", "wb");
        if (!pfisOUT) return;                  /* the output file could not be opened */
        fwrite(&dim, sizeof(int), 1, pfisOUT);
        for (int k = 0; k < dim; k++) {
            unsigned int j;
            char *rez;
            /* the buffers hold the decimal digits plus the terminating '\0' */
            char inaltime[6];
            char greutate[4];
            char scoala[6];
            /* sprintf is used here as a portable replacement for _itoa */
            sprintf(inaltime, "%u", (unsigned int)listaStud[k].inaltime);
            sprintf(greutate, "%u", (unsigned int)listaStud[k].greutate);
            sprintf(scoala, "%u", (unsigned int)listaStud[k].scoala);
            int dim_Articol = (int)(strlen(inaltime) + strlen(greutate) + strlen(scoala) +
                strlen(listaStud[k].nume) + strlen(listaStud[k].prenume) +
                strlen(listaStud[k].localitate) + strlen(listaStud[k].cnp));
            rez = new char[dim_Articol + 7];   /* 7 fields, each followed by '#' */
            int i = 0;

            for (j = 0; j < strlen(listaStud[k].nume); j++, i++)
                rez[i] = listaStud[k].nume[j];
            rez[i] = '#';
            i++;

            for (j = 0; j < strlen(listaStud[k].prenume); j++, i++)
                rez[i] = listaStud[k].prenume[j];
            rez[i] = '#';
            i++;

            for (j = 0; j < strlen(listaStud[k].cnp); j++, i++)
                rez[i] = listaStud[k].cnp[j];
            rez[i] = '#';
            i++;

            for (j = 0; j < strlen(inaltime); j++, i++)
                rez[i] = inaltime[j];
            rez[i] = '#';
            i++;

            for (j = 0; j < strlen(greutate); j++, i++)
                rez[i] = greutate[j];
            rez[i] = '#';
            i++;

            for (j = 0; j < strlen(scoala); j++, i++)
                rez[i] = scoala[j];
            rez[i] = '#';
            i++;

            for (j = 0; j < strlen(listaStud[k].localitate); j++, i++)
                rez[i] = listaStud[k].localitate[j];
            rez[i] = '#';
            i++;

            fwrite(rez, sizeof(char), dim_Articol + 7, pfisOUT);
            delete[] rez;                      /* rez was allocated with new[] */
        }
        fclose(pfisOUT);
    }
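The same record layout can be produced more compactly by letting a formatted-output call insert the markers. The sketch below is an alternative written for illustration, not the routine measured in the paper; `format_record` is a hypothetical name, and the caller is assumed to supply the field values and a sufficiently large buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Format one article as '#'-terminated fields into a caller-supplied buffer.
// Returns the number of bytes the record occupies, or 0 if it does not fit.
std::size_t format_record(char *buffer, std::size_t capacity,
                          const char *name, const char *first_name,
                          const char *pnc, unsigned height,
                          unsigned weight, unsigned school, const char *city)
{
    assert(buffer != nullptr && capacity > 0);
    int written = std::snprintf(buffer, capacity, "%s#%s#%s#%u#%u#%u#%s#",
                                name, first_name, pnc, height, weight, school, city);
    if (written < 0 || static_cast<std::size_t>(written) >= capacity)
        return 0;  // encoding error or buffer too small
    return static_cast<std::size_t>(written);
}
```

The returned length can be passed directly to fwrite, which avoids computing the article size by hand and removes the need for the fixed-size digit buffers.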

- the data are written in the output file using the character marker '#' to separate the
string values of the stud article; the numerical data are stored in their internal binary
format; this solution is the implementation of the second storage variant; the database
obtained, $BD_{numeric}$, has the dimension $LG(BD_{numeric}) = 448$ bytes; the subprogram
used to write the data is

    void transformare2_OUT(stud *listaStud, int dim)
    {
        FILE *pfisOUT = fopen("DateTEST2.txt", "wb");
        if (!pfisOUT) return;                  /* the output file could not be opened */
        fwrite(&dim, sizeof(int), 1, pfisOUT);
        for (int k = 0; k < dim; k++)
        {
            unsigned int j;
            char *rez;
            int dim_Articol = (int)(strlen(listaStud[k].nume) + strlen(listaStud[k].prenume)
                + strlen(listaStud[k].localitate) + strlen(listaStud[k].cnp));
            rez = new char[dim_Articol + 4];   /* 4 string fields, each followed by '#' */
            int i = 0;

            for (j = 0; j < strlen(listaStud[k].nume); j++, i++)
                rez[i] = listaStud[k].nume[j];
            rez[i] = '#';
            i++;

            for (j = 0; j < strlen(listaStud[k].prenume); j++, i++)
                rez[i] = listaStud[k].prenume[j];
            rez[i] = '#';
            i++;

            for (j = 0; j < strlen(listaStud[k].cnp); j++, i++)
                rez[i] = listaStud[k].cnp[j];
            rez[i] = '#';
            i++;

            for (j = 0; j < strlen(listaStud[k].localitate); j++, i++)
                rez[i] = listaStud[k].localitate[j];
            rez[i] = '#';
            i++;

            /* the string fields first, then the numeric fields in binary form */
            fwrite(rez, sizeof(char), dim_Articol + 4, pfisOUT);
            fwrite(&listaStud[k].inaltime, sizeof(unsigned short int), 1, pfisOUT);
            fwrite(&listaStud[k].greutate, sizeof(unsigned char), 1, pfisOUT);
            fwrite(&listaStud[k].scoala, sizeof(unsigned short int), 1, pfisOUT);

            delete[] rez;                      /* rez was allocated with new[] */
        }
        fclose(pfisOUT);
    }

- the data are stored without separator markers between the article fields, because the
structure of the stud article allows a numeric field to be placed between two string
fields; despite the small disk space taken by the resulting output file, the solution
given by the third variant must be modified in practice so that the marker '#' is placed
after each numeric value; this reduces the effort of writing the code sequences that
identify, inside the file, the boundary between a string value and a numeric one; the
resulting database, $BD_{combinat}$, formed without using the marker, has the dimension
$LG(BD_{combinat}) = 418$ bytes, and the data saving routine is


    void transformare3_OUT(stud *listaStud, int dim)
    {
        FILE *pfisOUT = fopen("DateTEST3.txt", "wb");
        if (!pfisOUT) return;                  /* the output file could not be opened */
        fwrite(&dim, sizeof(int), 1, pfisOUT);
        char StudentEnd = '#';                 /* article end marker */
        for (int k = 0; k < dim; k++)
        {
            /* string and numeric fields alternate, so no field markers are written */
            fwrite(listaStud[k].nume, strlen(listaStud[k].nume), 1, pfisOUT);
            fwrite(&listaStud[k].inaltime, sizeof(unsigned short int), 1, pfisOUT);
            fwrite(listaStud[k].prenume, strlen(listaStud[k].prenume), 1, pfisOUT);
            fwrite(&listaStud[k].greutate, sizeof(unsigned char), 1, pfisOUT);
            fwrite(listaStud[k].cnp, strlen(listaStud[k].cnp), 1, pfisOUT);
            fwrite(&listaStud[k].scoala, sizeof(unsigned short int), 1, pfisOUT);
            fwrite(listaStud[k].localitate, strlen(listaStud[k].localitate), 1, pfisOUT);
            fwrite(&StudentEnd, sizeof(char), 1, pfisOUT);
        }
        fclose(pfisOUT);
    }

for this solution, the reverse operation, used to read the data from the file, is not
discussed;

- the data are saved into the file using a symbol vocabulary that contains the distinct
string values of the article fields; in order to minimize the vocabulary dimension, its
elements are separated by the '#' marker; inside the database, these values are replaced
by their vocabulary position; the new data structure for the stud article is in this case

    struct pozvocabular
    {
        unsigned char poznume;        /* vocabulary position of the name */
        unsigned char pozprenume;     /* vocabulary position of the first name */
        unsigned char pozcnp;         /* vocabulary position of the PNC */
        unsigned short int inaltime;  /* height */
        unsigned char greutate;       /* weight */
        unsigned short int scoala;    /* school */
        unsigned char pozloc;         /* vocabulary position of the city */
    };

the new database dimension is obtained by summing the vocabulary dimension and the length
of the values zone, $LG(BD_{vocabular}) = 299 + 124 = 423$ bytes; because the example
dataset is small, the efficiency of this solution is not highlighted; the code sequence
used to convert the database from the normal form to the current one is


    void transformare4_OUT(stud *listaStud, int dim)
    {
        /* vocabular, IsInVocabular and AddVocabular belong to the application's
           linked-list vocabulary; their definitions are not listed here */
        vocabular *Vocabular = NULL;
        vocabular *VocabularEnd = NULL;
        int flag = 0;

        FILE *pfisOUT = fopen("DateTEST4.txt", "wb");
        if (!pfisOUT) return;                  /* the output file could not be opened */
        fwrite(&dim, sizeof(int), 1, pfisOUT);

        /* build the vocabulary while writing the articles */
        pozvocabular elemCurent;
        int elemDictionar = 0;

        for (int k = 0; k < dim; k++)
        {
            elemCurent.greutate = listaStud[k].greutate;
            elemCurent.inaltime = listaStud[k].inaltime;
            elemCurent.scoala = listaStud[k].scoala;

            flag = IsInVocabular(listaStud[k].nume, Vocabular);
            if (flag == -1)
            {
                AddVocabular(listaStud[k].nume, Vocabular, VocabularEnd);
                elemCurent.poznume = elemDictionar;
                elemDictionar++;
            }
            else
                elemCurent.poznume = flag;

            flag = IsInVocabular(listaStud[k].prenume, Vocabular);
            if (flag == -1)
            {
                AddVocabular(listaStud[k].prenume, Vocabular, VocabularEnd);
                elemCurent.pozprenume = elemDictionar;
                elemDictionar++;
            }
            else
                elemCurent.pozprenume = flag;

            flag = IsInVocabular(listaStud[k].cnp, Vocabular);
            if (flag == -1)
            {
                AddVocabular(listaStud[k].cnp, Vocabular, VocabularEnd);
                elemCurent.pozcnp = elemDictionar;
                elemDictionar++;
            }
            else
                elemCurent.pozcnp = flag;

            flag = IsInVocabular(listaStud[k].localitate, Vocabular);
            if (flag == -1)
            {
                AddVocabular(listaStud[k].localitate, Vocabular, VocabularEnd);
                elemCurent.pozloc = elemDictionar;
                elemDictionar++;
            }
            else
                elemCurent.pozloc = flag;

            fwrite(&elemCurent, sizeof(pozvocabular), 1, pfisOUT);
        }
        fclose(pfisOUT);

        /* write the vocabulary, one '#'-terminated word after another */
        pfisOUT = fopen("DateTEST4Vocabular.txt", "wb");
        if (!pfisOUT) return;
        fwrite(&elemDictionar, sizeof(int), 1, pfisOUT);

        char caracterVocab = '#';
        if (Vocabular != NULL)
            for (vocabular *temp = Vocabular; temp != NULL; temp = temp->next)
            {
                fwrite(temp->element, strlen(temp->element), 1, pfisOUT);
                fwrite(&caracterVocab, sizeof(char), 1, pfisOUT);
            }
        fclose(pfisOUT);
    }
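The sizes of the files written with fwrite and sizeof depend on how the compiler lays out the stud and pozvocabular structures. The sketch below, which is an added illustration and not part of the application, compares the sum of the declared field sizes with the value of sizeof. The constants and function names are hypothetical, and the exact amount of padding is an assumption that varies with compiler and packing options.

```cpp
#include <cassert>
#include <cstddef>

// Article layout of the fixed-length variant (field names as in the paper).
struct stud {
    char nume[13];
    char prenume[9];
    char cnp[14];
    unsigned short int inaltime;
    unsigned char greutate;
    unsigned short int scoala;
    char localitate[11];
};

// Article layout of the vocabulary variant.
struct pozvocabular {
    unsigned char poznume;
    unsigned char pozprenume;
    unsigned char pozcnp;
    unsigned short int inaltime;
    unsigned char greutate;
    unsigned short int scoala;
    unsigned char pozloc;
};

// Sum of the declared field sizes, without any alignment padding.
static const std::size_t kStudFieldBytes =
    sizeof(stud::nume) + sizeof(stud::prenume) + sizeof(stud::cnp) +
    sizeof(stud::inaltime) + sizeof(stud::greutate) + sizeof(stud::scoala) +
    sizeof(stud::localitate);

static const std::size_t kPozFieldBytes =
    sizeof(pozvocabular::poznume) + sizeof(pozvocabular::pozprenume) +
    sizeof(pozvocabular::pozcnp) + sizeof(pozvocabular::inaltime) +
    sizeof(pozvocabular::greutate) + sizeof(pozvocabular::scoala) +
    sizeof(pozvocabular::pozloc);

// Bytes added by the compiler to keep the fields aligned (may be zero).
std::size_t stud_padding() { return sizeof(stud) - kStudFieldBytes; }
std::size_t pozvocabular_padding() { return sizeof(pozvocabular) - kPozFieldBytes; }
```

With a two-byte unsigned short, the field sums are 52 and 9 bytes. With default alignment, sizeof(pozvocabular) is typically 12, which would be consistent with the 124 bytes reported for the values zone (a four-byte count plus ten 12-byte articles). Under the same default alignment, sizeof(stud) can likewise exceed 52, so a packing option would be needed to write exactly 52 bytes per article.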


For each of the described routines a set of parameters has been recorded; they are
presented in table 5. The development environment of the software application is Microsoft
Visual Studio 6.0, used without compiler-specific optimization options. The processing
effort of the implemented solutions has been measured with the profiler of the Visual
Studio environment.


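Table 5. Parameters recorded for different data storage methods

Output database  | Dimension (bytes) | Vocabulary (bytes) | Database (bytes) | Save (mseconds) | Load (mseconds)
-----------------|-------------------|--------------------|------------------|-----------------|----------------
BDF              | 520               | -                  | 520              | 0.032           | 0.039
$BD_{separator}$ | 504               | -                  | 504              | 0.618           | 0.124
$BD_{numeric}$   | 448               | -                  | 448              | 0.713           | 0.121
$BD_{combinat}$  | 418               | -                  | 418              | 0.582           | -
$BD_{vocabular}$ | 124               | 299                | 423              | 1.235           | 0.233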


The table 5 values show that the processing effort increases with the degree to which the
stored data size is minimized. Although $BD_{vocabular}$ is larger than $BD_{combinat}$
for this example, in real cases, with a large amount of data, the vocabulary solution
leads to better results.
Conclusions

Real-world collectivities are described according to the objectives pursued. From this
point of view, it is important to define article structures that are flexible enough that
changes in objectives do not affect them radically.

In the analysis phase, new storage solutions are developed for each database and file, and
the designers' vision has an important impact on them. Discussing and promoting new
solutions creates the premises for further developing the creative spirit in the direction
of adapting all optimization instruments, techniques, methods and algorithms to particular
cases. The objective is to obtain numerous different solutions for storing data, in order
to analyze them and to select the one that gives the best results.

For each problem, specific performance criteria are defined, together with the procedures
used to measure the optimization effects, which provides the basis for comparing the
variants.

As more experience in data storage optimization is accumulated, homogeneous databases with
efficient storage techniques will be obtained.

Based on practical experience, optimal storage procedures are defined, specifying which
storage method gives the best results for a database with well-defined characteristics.